
180 research outputs found

MMseqs software suite for fast and deep clustering and searching of large protein sequence sets.

Author: Hauser M.
Steinegger M.
Söding J.
Publication venue: Oxford University Press (OUP)
Publication date: 01/05/2016

Sequence databases are growing fast, challenging existing analysis pipelines. Reducing the redundancy of sequence databases by similarity clustering improves speed and sensitivity of iterative searches. But existing tools cannot efficiently cluster databases of the size of UniProt to 50% maximum pairwise sequence identity or below. Furthermore, in metagenomics experiments typically large fractions of reads cannot be matched to any known sequence anymore because searching with sensitive but relatively slow tools (e.g. BLAST or HMMER3) through comprehensive databases such as UniProt is becoming too costly. RESULTS: MMseqs (Many-against-Many sequence searching) is a software suite for fast and deep clustering and searching of large datasets, such as UniProt, or 6-frame translated metagenomics sequencing reads. MMseqs contains three core modules: a fast and sensitive prefiltering module that sums up the scores of similar k-mers between query and target sequences, an SSE2- and multi-core-parallelized local alignment module, and a clustering module. In our homology detection benchmarks, MMseqs is much more sensitive and 4 to 30 times faster than UBLAST and RAPsearch, respectively, although it does not reach BLAST sensitivity yet. Using its cascaded clustering workflow, MMseqs can cluster large databases down to ~30% sequence identity at hundreds of times the speed of BLASTclust and much deeper than CD-HIT and USEARCH. MMseqs can also update a database clustering in linear instead of quadratic time. Its much improved sensitivity-speed trade-off should make MMseqs attractive for a wide range of large-scale sequence analysis tasks.

MPG.PuRe

Fast and sensitive taxonomic assignment to metagenomic contigs

Author: Breitwieser F.
Levy Karin E.
Mirdita M.
Steinegger M.
Söding J.
Publication venue: Oxford University Press (OUP)
Publication date: 18/03/2021

MMseqs2 taxonomy is a new tool to assign taxonomic labels to metagenomic contigs. It extracts all possible protein fragments from each contig, quickly retains those that can contribute to taxonomic annotation, assigns them robust labels and determines the contig’s taxonomic identity by weighted voting. Its fragment extraction step is suitable for the analysis of all domains of life. MMseqs2 taxonomy is 2–18× faster than state-of-the-art tools and also contains new modules for creating and manipulating taxonomic reference databases as well as reporting and visualizing taxonomic assignments.

PubMed Central

MPG.PuRe

Protein sequence analysis using the MPI Bioinformatics Toolkit

Author: Alva V.
Gabler F.
Lupas A.
Mirdita M.
Nam S.
Steinegger M.
Söding J.
Till S.
Publication venue: Wiley
Publication date: 01/12/2020

The MPI Bioinformatics Toolkit (https://toolkit.tuebingen.mpg.de) provides interactive access to a wide range of the best‐performing bioinformatics tools and databases, including the state‐of‐the‐art protein sequence comparison methods HHblits and HHpred. The Toolkit currently includes 35 external and in‐house tools, covering functionalities such as sequence similarity searching, prediction of sequence features, and sequence classification. Due to this breadth of functionality, the tight interconnection of its constituent tools, and its ease of use, the Toolkit has become an important resource for biomedical research and for teaching protein sequence analysis to students in the life sciences. In this article, we provide detailed information on utilizing the three most widely accessed tools within the Toolkit: HHpred for the detection of homologs, HHpred in conjunction with MODELLER for structure prediction and homology modeling, and CLANS for the visualization of relationships in large sequence datasets. Basic Protocol 1: Sequence similarity searching using HHpred. Alternate Protocol: Pairwise sequence comparison using HHpred. Support Protocol: Building a custom multiple sequence alignment using PSI‐BLAST and forwarding it as input to HHpred. Basic Protocol 2: Calculation of homology models using HHpred and MODELLER. Basic Protocol 3: Cluster analysis using CLANS.

MPG.PuRe

HH-suite3 for fast remote homology detection and deep protein annotation.

Author: Haunsberger S.
Meier M.
Mirdita M.
Steinegger M.
Söding J.
Vöhringer H.
Publication venue: Springer Science and Business Media LLC
Publication date: 14/09/2019

BACKGROUND: HH-suite is a widely used open source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignment of profile Hidden Markov models (HMMs), which represent multiple sequence alignments of homologous proteins. RESULTS: We developed a single-instruction multiple-data (SIMD) vectorized implementation of the Viterbi algorithm for profile HMM alignment and introduced various other speed-ups. These accelerated the search methods HHsearch by a factor of 4 and HHblits by a factor of 2 over the previous version 2.0.16. HHblits3 is ∼10× faster than PSI-BLAST and ∼20× faster than HMMER3. Jobs to perform HHsearch and HHblits searches with many query profile HMMs can be parallelized over cores and over cluster servers using OpenMP and message passing interface (MPI). The free, open-source, GPLv3-licensed software is available at https://github.com/soedinglab/hh-suite. CONCLUSION: The added functionalities and increased speed of HHsearch and HHblits should facilitate their use in large-scale protein structure and function prediction, e.g. in metagenomics and genomics projects.

RCSI Repository

MPG.PuRe

Cross-phyla protein annotation by structural prediction and alignment

Author: Arendt Detlev
Mirdita Milot
Musser Jacob M.
Papadopoulos Nikolaos
Ruperti Fabian
Steinegger Martin
Publication venue: BMC
Publication date: 14/05/2023

Background: Protein annotation is a major goal in molecular biology, yet experimentally determined knowledge is typically limited to a few model organisms. In non-model species, the sequence-based prediction of gene orthology can be used to infer protein identity; however, this approach loses predictive power at longer evolutionary distances. Here we propose a workflow for protein annotation using structural similarity, exploiting the fact that similar protein structures often reflect homology and are more conserved than protein sequences. Results: We propose a workflow of openly available tools for the functional annotation of proteins via structural similarity (MorF: MorphologFinder) and use it to annotate the complete proteome of a sponge. Sponges are highly relevant for inferring the early history of animals, yet their proteomes remain sparsely annotated. MorF accurately predicts the functions of proteins with known homology in >90% of cases and annotates an additional 50% of the proteome beyond standard sequence-based methods. We uncover new functions for sponge cell types, including extensive FGF, TGF, and Ephrin signaling in sponge epithelia, and redox metabolism and control in myopeptidocytes. Notably, we also annotate genes specific to the enigmatic sponge mesocytes, proposing they function to digest cell walls. Conclusions: Our work demonstrates that structural similarity is a powerful approach that complements and extends sequence similarity searches to identify homologous proteins over long evolutionary distances. We anticipate this will be a powerful approach that boosts discovery in numerous -omics datasets, especially for non-model organisms.

SNU Open Repository and Archive

Active region properties and irradiance variations

Author: de Toma
Domingo
Floyd
Fröhlich
Győri
Heath
Hudson
Judit M. Pap
Kretzschmar
Krivova
Norton
Ortiz
Pap
Preminger
Schmidt
Steinegger
Tünde Baranyi
Vigouroux
Wesolowski
Willson
Woods
Zahid
Publication venue: Elsevier BV
Publication date: 01/01/2012

Crossref

Repository of the Academy's Library

Bacterial microevolution and the Pangenome

Author: A Bankevich
AE Darling
AJ Page
AO Kislyuk
B Charlesworth
C Buckee
C Collins
C Wiuf
CM Thomas
CS Pepperell
DJ Wilson
DR Zerbino
E Jacox
F Lassalle
GE Sims
GJ Szollosi
H Ochman
IJ Wilson
J Hedge
J Lawrence
JB Joy
JFC Kingman
KAA Jolley
KE Dingle
KT Konstantinidis
L Li
L Petersen
M Csurös
M Nordborg
M Pagel
M Steinegger
M Touchon
M Vos
MJ Ward
MTG Holden
NA Rosenberg
NJ Croucher
P Donnelly
PAP Moran
R Griffiths
RC Griffiths
RG Everitt
RK Aziz
S Castillo-Ramírez
S Kurtz
S Wright
SF Altschul
SK Sheppard
SS Abby
SV Angiuoli
T Ohta
T Seemann
TG Vaughan
WP Maddison
X Didelot
Z Yang
Z Zhou
Publication venue: Springer Science and Business Media LLC
Publication date: 01/05/2020

The comparison of multiple genome sequences sampled from a bacterial population reveals considerable diversity in both the core and the accessory parts of the pangenome. This diversity can be analysed in terms of microevolutionary events that took place since the genomes shared a common ancestor, especially deletion, duplication, and recombination. We review the basic modelling ingredients used implicitly or explicitly when performing such a pangenome analysis. In particular, we describe a basic neutral phylogenetic framework of bacterial pangenome microevolution, which is not incompatible with evaluating the role of natural selection. We survey the different ways in which pangenome data is summarised in order to be included in microevolutionary models, as well as the main methodological approaches that have been proposed to reconstruct pangenome microevolutionary history.

Crossref

Warwick Research Archives Portal Repository

Willow Leaves' Extracts Contain Anti-Tumor Agents Effective against Three Cell Types

Author: Ahmed M. Aboul-Enein
E Steinegger
FB Hu
Floyd Romesberg
HA El-Shemy
Hany A. El-Shemy
J Metzner
JM Bennett
JT Cheng
Khalid Mostafa Aboul-Enein
KKS Bhatt
Kounosuke Fujita
L Bravo
M Freeman
M Gronbaek
M Jang
M Leven
M Wood
MS Kupchan
MV Clement
O Claudia
O Fiehn
R Mungur
S Renaud
SS Han
TC Hsieh
VW Adamkiewicz
WC Willett
X Gao
Y Matsumoto
Publication venue: Public Library of Science
Publication date: 31/01/2007

Many higher plants contain novel metabolites with antimicrobial, antifungal and antiviral properties. However, in the developed world almost all clinically used chemotherapeutics have been produced by in vitro chemical synthesis. Exceptions, like taxol and vincristine, were structurally complex metabolites that were difficult to synthesize in vitro. Many non-natural, synthetic drugs cause severe side effects that are not acceptable except as treatments of last resort for terminal diseases such as cancer. The metabolites discovered in medicinal plants may avoid the side effects of synthetic drugs, because they must accumulate within living cells. The aim here was to test an aqueous extract from the young developing leaves of willow (Salix safsaf, Salicaceae) trees for activity against human carcinoma cells in vivo and in vitro. In vivo, Ehrlich Ascites Carcinoma Cells (EACC) were injected into the intraperitoneal cavity of mice. The willow extract was fed via stomach tube. The EACC-derived tumor growth was reduced by the willow extract and death was delayed (for 35 days). In vitro, the willow extract could kill the majority (75%–80%) of abnormal cells among primary cells harvested from seven patients with acute lymphoblastic leukemia (ALL) and 13 with acute myeloid leukemia (AML). DNA fragmentation patterns within treated cells implied that targeted cell death by apoptosis had occurred. The metabolites within the willow extract may act as tumor inhibitors that promote apoptosis, cause DNA damage, and affect cell membranes and/or denature proteins.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central